Smart Paper Notes

Recursive Deformable Image Registration Network with Mutual Attention

Jian-Qing Zheng , Ziyang Wang , Baoru Huang , Ngee Han Lim , Tonia Vincent , Bartlomiej W. Papiez

Category: Computer Vision

2022-06-04

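Deformable image registration, which estimates the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based methods with multi-stage registration to perform 3D image registration and improve performance. However, the performance of multi-stage methods is limited by the size of the receptive field, since complex motion does not occur at a single spatial scale. We propose a new registration network that combines a recursive network architecture with a mutual attention mechanism to overcome these limitations. Compared with state-of-the-art deep learning methods, our recursive-structure-based network achieves the highest accuracy on a lung computed tomography (CT) dataset (Dice score of 92% and average surface distance of 3.8 mm for the lungs) and one of the most accurate results on an abdominal CT dataset with 9 organs of various sizes (Dice score of 55% and average surface distance of 7.8 mm). We also show that adding 3 recursive networks is sufficient to achieve state-of-the-art results without a significant increase in inference time.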

Translated by Google Translate
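
The Dice scores reported above measure the overlap between a warped segmentation and the fixed-image segmentation, Dice = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, assuming binary masks stored as flattened lists of voxels and integer label maps where 0 is background; the function names are hypothetical and this is not the authors' evaluation code.

```python
def dice_score(warped_mask, fixed_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks with the same voxel order."""
    if len(warped_mask) != len(fixed_mask):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(warped_mask, fixed_mask) if a and b)
    total = sum(1 for a in warped_mask if a) + sum(1 for b in fixed_mask if b)
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * intersection / total


def mean_dice(warped_labels, fixed_labels, organ_ids):
    """Average the per-organ Dice scores of two integer label maps (0 = background)."""
    scores = []
    for organ in organ_ids:
        warped_mask = [int(label == organ) for label in warped_labels]
        fixed_mask = [int(label == organ) for label in fixed_labels]
        scores.append(dice_score(warped_mask, fixed_mask))
    return sum(scores) / len(scores)


# Toy flattened volumes: 3 shared foreground voxels out of 4 + 4.
print(dice_score([0, 1, 1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1, 0, 0]))  # 0.75
```

Averaging per-organ scores with a helper like mean_dice is one common way to summarize a multi-organ dataset such as the 9-organ abdominal CT set; the abstract does not say exactly how the paper aggregates them.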

SUPER-Rec: SUrrounding Position-Enhanced Representation for Recommendation

Taejun Lim , Siqu Long , Josiah Poon , Soyeon Caren Han

Category: Artificial Intelligence

2022-09-09

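Collaborative filtering problems are commonly solved with matrix completion techniques, which recover the missing values of a user-item interaction matrix. In the matrix, the position of a rating specifically identifies the user who gave it and the item that was rated. Previous matrix completion techniques tend to neglect the position of each element (user, item, and rating) in the matrix and instead focus mainly on the semantic similarity between users and items to predict the missing values. This paper proposes SUPER-Rec, a novel position-enhanced user/item representation training model for recommendation. We first capture the rating position in the matrix using relative positional rating encoding, and store the position-enhanced rating information and its user-item relationship in a fixed-dimension embedding that is not affected by the matrix size. Then, we apply the trained position-enhanced user and item representations to the simplest traditional machine learning models to highlight the pure novelty of our representation model. We provide the first formal introduction and quantitative analysis of position-enhanced item representation in the recommendation domain, together with a principled discussion of SUPER-Rec's superior performance on typical collaborative filtering recommendation tasks with both explicit and implicit feedback.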
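
The abstract contrasts SUPER-Rec with conventional matrix completion, which fills the missing entries of the user-item matrix from the observed ones. As a point of reference only, here is a minimal sketch of one such baseline, plain matrix factorization trained with SGD. It is not SUPER-Rec's relative positional rating encoding, which the abstract does not specify, and the toy ratings and hyperparameters are hypothetical.

```python
import math
import random


def factorize(observed, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ r for each observed (u, i, r)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error plus L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_rmse(P, Q, observed):
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed))


# Hypothetical 3x3 rating matrix with two missing entries: (user 0, item 2) and (user 2, item 0).
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 3), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(observed, n_users=3, n_items=3)
print(round(train_rmse(P, Q, observed), 3))
print(round(predict(P, Q, 0, 2), 2), round(predict(P, Q, 2, 0), 2))  # filled-in missing ratings
```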

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Jean Lee , Taejun Lim , Heejun Lee , Bogeun Jo , Yangsok Kim , Heegeun Yoon , Soyeon Caren Han

Category: Natural Language Processing | Artificial Intelligence

2022-08-23

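Online hate speech detection has become important with the growth of digital devices, but resources for languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides multi-label classification with 1 to 4 labels per utterance, handling subjectivity and intersectionality. We evaluate strong baselines on K-MHaS. KR-BERT with a sub-character tokenizer performs best, recognizing decomposed characters in each hate speech class.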
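
KR-BERT's sub-character tokenizer works on Hangul syllables broken down into their component letters (jamo). The decomposition itself follows standard Unicode arithmetic: each precomposed syllable equals 0xAC00 + (initial × 21 + medial) × 28 + final. This is a minimal sketch of that step only, assuming Unicode conjoining jamo as the output alphabet; the function names are hypothetical and this is not KR-BERT's actual tokenizer.

```python
import unicodedata

HANGUL_BASE, N_SYLLABLES = 0xAC00, 11172  # 19 initials x 21 medials x 28 finals
N_MEDIALS, N_FINALS = 21, 28
INITIALS = [chr(0x1100 + i) for i in range(19)]       # conjoining initial consonants (choseong)
MEDIALS = [chr(0x1161 + i) for i in range(21)]        # conjoining vowels (jungseong)
FINALS = [""] + [chr(0x11A8 + i) for i in range(27)]  # index 0 = no final consonant


def decompose_syllable(ch):
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < N_SYLLABLES:
        return ch  # not a precomposed Hangul syllable: keep as-is
    initial, rest = divmod(index, N_MEDIALS * N_FINALS)
    medial, final = divmod(rest, N_FINALS)
    return INITIALS[initial] + MEDIALS[medial] + FINALS[final]


def to_sub_characters(text):
    return "".join(decompose_syllable(c) for c in text)


print(len(to_sub_characters("한국어")))  # 8 jamo: ㅎㅏㄴ ㄱㅜㄱ ㅇㅓ
print(to_sub_characters("한국어") == unicodedata.normalize("NFD", "한국어"))  # True
```

The NFD comparison is a handy sanity check, since Unicode canonical decomposition splits Hangul syllables the same way.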

Jack and Masters of All Trades: One-Pass Learning of a Set of Model Sets from Foundation AI Models

Han Xiang Choong , Yew-Soon Ong , Abhishek Gupta , Ray Lim

Category: Neural and Evolutionary Computing

2022-05-02

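For deep learning, size is power. Massive neural nets trained on broad data are at the forefront of artificial intelligence. These foundation models, or "Jacks of All Trades" (JATs), when fine-tuned for downstream tasks, are becoming important in driving deep learning advances. However, environments with tight resource constraints, changing objectives and intentions, or varied task requirements can limit the real-world utility of a singular JAT. Hence, in tandem with the current trend toward building ever larger JATs, this paper conducts an initial exploration into the concepts underlying the creation of a diverse set of compact machine learning model sets. Composed of many smaller and specialized models, the set of sets we formulate is designed to simultaneously fulfil many task settings and environmental conditions. A means of arriving at such a set in one pass of a neuroevolutionary multitasking algorithm is presented for the first time, bringing us closer to models that are collectively "Masters of All Trades".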

Bridging the gap between prostate radiology and pathology through machine learning

Indrani Bhattacharya , David S. Lim , Han Lin Aung , Xingchen Liu , Arun Seetharaman , Christian A. Kunder , Wei Shao , Simon J. C. Soerensen , Richard E. Fan , Pejman Ghanouni

Category: Computer Vision

2021-12-03

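Prostate cancer is the second deadliest cancer for American men. While magnetic resonance imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited by high rates of false positives and false negatives as well as low inter-reader agreement. Machine learning methods that detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture but also in the ground truth labeling strategies used for model training. In this study, we compare different labeling strategies, namely pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (from a previously validated deep learning algorithm that predicts pixel-level Gleason patterns on whole-mount histopathology images). We analyze the effects these labels have on the performance of the trained machine learning models. Our experiments show that (1) radiologist labels, and models trained with them, can miss cancers or underestimate cancer extent; (2) digital pathologist labels, and models trained with them, have high concordance with pathologist labels; and (3) models trained with digital pathologist labels achieve the best performance in two different cohorts with different disease distributions, irrespective of the model architecture used. Digital pathologist labels can reduce the challenges associated with human annotations, including labor, time, and inter-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI.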

KLUE: Korean Language Understanding Evaluation

Sungjoon Park , Jihyung Moon , Sungdong Kim , Won Ik Cho , Jiyoon Han , Jangwon Park , Chisung Song , Junseong Kim , Yongsook Song , Taehwan Oh

Category: Natural Language Processing

2021-05-20

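We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: topic classification, semantic textual similarity, natural language inference, named entity recognition, relation extraction, dependency parsing, machine reading comprehension, and dialogue state tracking. We build all tasks from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design the annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We also release pretrained language models (PLMs), KLUE-BERT and KLUE-RoBERTa, to help reproduce baseline models on KLUE and thereby facilitate future research. We make several interesting observations from preliminary experiments with the proposed KLUE benchmark suite, which already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal performance degradation even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate the creation of similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com.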
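
The last finding concerns BPE applied on top of morpheme-level pre-tokenization. The key property is that merges are learned only inside each pre-tokenized unit, so no subword can span a morpheme boundary. A minimal sketch, assuming a toy English example ("low" + "er", "low" + "est") standing in for morphemes produced by a Korean morphological analyzer; the helper name and tie-breaking rule are hypothetical, not KLUE's tokenizer code.

```python
from collections import Counter


def learn_bpe(pre_tokens, num_merges):
    """Learn BPE merges inside each pre-token (e.g. a morpheme); merges never cross pre-token boundaries."""
    vocab = Counter(tuple(tok) for tok in pre_tokens)  # symbol sequence -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every pre-token is already a single symbol
        # Most frequent pair; ties broken by the lexicographically largest pair (an arbitrary choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Hypothetical morpheme split of "lower" and "lowest".
print(learn_bpe(["low", "er", "low", "est"], 2))  # [('o', 'w'), ('l', 'ow')]
print(learn_bpe(["lower", "lowest"], 1))          # [('w', 'e')]
```

With the split input, the first two merges are ('o', 'w') and ('l', 'ow'); on the unsplit words, the very first merge is ('w', 'e'), which crosses the "low"/"er" boundary.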

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Category: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models, which are critical for real-world applications, cannot benefit from this pre-training approach, or benefit only marginally. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; and so on. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our base-size TinyMIM model achieves 52.2 mIoU on ADE20K semantic segmentation, +4.1 higher than the MAE baseline. Our tiny-size TinyMIM model achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
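
Finding 1) says that distilling token relations beats CLS-token and feature distillation. One way to read "token relations" is as the N × N attention-style map softmax(QKᵀ/√d) over the N tokens, matched between teacher and student with a KL divergence. A minimal sketch under that assumption; the exact relations and loss TinyMIM uses are not given in the abstract, and all names and toy features here are hypothetical.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def token_relations(queries, keys):
    """N x N relation map: row-wise softmax of scaled dot products between token vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys]) for q in queries]


def relation_kl(teacher_rel, student_rel, eps=1e-12):
    """Mean over query tokens of KL(teacher row || student row)."""
    total = 0.0
    for p_row, q_row in zip(teacher_rel, student_rel):
        total += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_row, q_row))
    return total / len(teacher_rel)


# Hypothetical 3-token features: teacher width 4, student width 2 (queries reused as keys for brevity).
teacher = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t_rel = token_relations(teacher, teacher)
s_rel = token_relations(student, student)
print(relation_kl(t_rel, t_rel))  # 0.0
print(relation_kl(t_rel, s_rel))  # a positive distillation loss
```

Because each relation map is N × N, the teacher and student hidden widths need not match (4 vs. 2 above), so no projection layer is needed to compare them.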

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., it improves nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
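
The first insight, using support masks to build class centers that re-weight query features, can be illustrated with masked average pooling followed by cosine-similarity re-weighting. A minimal sketch, assuming per-pixel feature vectors flattened into lists and a binary support mask; RefT's actual mask-based dynamic weighting module is not described in detail in the abstract, so this is a generic stand-in with hypothetical names.

```python
import math


def masked_class_center(support_feats, support_mask):
    """Masked average pooling: mean feature over foreground pixels of the support mask."""
    dim = len(support_feats[0])
    total, count = [0.0] * dim, 0
    for feat, m in zip(support_feats, support_mask):
        if m:
            count += 1
            for j in range(dim):
                total[j] += feat[j]
    if count == 0:
        raise ValueError("support mask has no foreground pixels")
    return [x / count for x in total]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def reweight_query(query_feats, center):
    """Scale each query feature by its (non-negative) cosine similarity to the class center."""
    return [[max(0.0, cosine(f, center)) * x for x in f] for f in query_feats]


center = masked_class_center([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]], [1, 0, 1])
print(center)                                            # [2.0, 0.0]
print(reweight_query([[1.0, 0.0], [0.0, 2.0]], center))  # [[1.0, 0.0], [0.0, 0.0]]
```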

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang , Han Zhang , Jarred Barber , AJ Maschinot , Jose Lezama , Lu Jiang , Ming-Hsuan Yang , Kevin Murphy , William T. Freeman , Michael Rubinstein

Category: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to its use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, which translates to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M-parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The 3B-parameter Muse model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
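
Muse's parallel decoding fills in many masked image tokens per step instead of one at a time. A minimal sketch of that loop, assuming a MaskGIT-style cosine schedule that decides how many tokens stay masked after each step, and a hypothetical predict function standing in for the Transformer (which in Muse would be conditioned on the LLM text embedding). The schedule and confidence rule are assumptions, not Muse's released code.

```python
import math


def masked_counts(num_tokens, num_steps):
    """Tokens still masked after each step under a cosine schedule cos(pi/2 * t/T)."""
    counts, prev = [], num_tokens
    for step in range(1, num_steps + 1):
        ratio = math.cos(math.pi / 2 * step / num_steps)  # decays toward 0 at the last step
        remaining = math.floor(num_tokens * ratio)
        remaining = max(0, min(remaining, prev - 1))  # commit at least one token per step
        counts.append(remaining)
        prev = remaining
    return counts


def parallel_decode(num_tokens, num_steps, predict):
    """Start fully masked; at each step commit the most confident predictions in parallel."""
    tokens = [None] * num_tokens  # None marks a [MASK] position
    for remaining in masked_counts(num_tokens, num_steps):
        masked = [i for i, t in enumerate(tokens) if t is None]
        preds = predict(tokens)  # position -> (token_id, confidence) for masked positions
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[: len(masked) - remaining]:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens):
    # Hypothetical stand-in for the text-conditioned Transformer.
    return {i: (i % 7, 1.0 / (1 + i)) for i, t in enumerate(tokens) if t is None}


print(masked_counts(16, 4))  # [14, 11, 6, 0]
print(parallel_decode(16, 4, toy_predict))
```

With 16 tokens and 4 steps, the schedule keeps 14, 11, 6, and then 0 tokens masked, so 2, 3, 5, and 6 tokens are committed per step: 4 forward passes instead of the 16 a token-by-token autoregressive decoder would need.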

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

Han Huang , Leilei Sun , Bowen Du , Weifeng Lv

Category: Machine Learning

2023-01-01

Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and materials science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDEs) and derive discrete graph structures as the condition for the reverse generative process. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method still generates high-quality molecular graphs with a limited number of steps.
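
The forward process perturbs continuous graph variables, and a discrete structure is read back out to condition the reverse process. A minimal sketch, assuming a variance-preserving (VP) SDE with a linear β schedule (β_min = 0.1, β_max = 20, common defaults rather than CDGS's settings), edges encoded as ±1, and a zero threshold to recover the discrete adjacency. CDGS's actual noise model and discretization rule may differ, and all names here are hypothetical.

```python
import math
import random


def vp_marginal_coeffs(t, beta_min=0.1, beta_max=20.0):
    """Mean coefficient and std of the VP-SDE marginal p(x_t | x_0) with linear beta(t)."""
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t  # integral of beta(s) ds from 0 to t
    mean_coeff = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    return mean_coeff, std


def perturb_adjacency(adj, t, rng):
    """Sample a noisy symmetric adjacency: x_t = mean_coeff * x_0 + std * eps, edges as +1, non-edges as -1."""
    n = len(adj)
    mean_coeff, std = vp_marginal_coeffs(t)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x0 = 2.0 * adj[i][j] - 1.0
            noisy[i][j] = noisy[j][i] = mean_coeff * x0 + std * rng.gauss(0.0, 1.0)
    return noisy


def discretize(noisy):
    """Read a discrete graph structure back out of the continuous state by thresholding at 0."""
    n = len(noisy)
    return [[1 if i != j and noisy[i][j] > 0.0 else 0 for j in range(n)] for i in range(n)]


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical 3-node path graph
rng = random.Random(0)
print(discretize(perturb_adjacency(adj, 1e-4, rng)))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(vp_marginal_coeffs(1.0))
```

At small t the thresholded structure matches the clean graph, while near t = 1 the signal coefficient drops below 0.01 and the recovered structure is essentially random, which is the state the reverse process starts from.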
